Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 449 | 444 |
| Missing cells (%) | 8.4% | 8.3% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
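The missing-cell percentages can be checked from the counts in the table. The sketch below assumes the denominator is observations times variables, which the reported figures are consistent with; the helper name `missing_cell_pct` is illustrative and is not part of ydata-profiling.

```python
# Figures copied from the "Dataset statistics" table.
N_VARIABLES = 12
N_OBSERVATIONS = 446


def missing_cell_pct(missing_cells, rows=N_OBSERVATIONS, cols=N_VARIABLES):
    """Return missing cells as a percentage of all cells (rows * cols)."""
    return 100.0 * missing_cells / (rows * cols)


pct_a = missing_cell_pct(449)  # Dataset A
pct_b = missing_cell_pct(444)  # Dataset B
print(f"Dataset A: {pct_a:.1f}%")
print(f"Dataset B: {pct_b:.1f}%")
```

Both values round to the percentages shown in the table.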
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation |
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation |
| Age has 99 (22.2%) missing values | Age has 93 (20.9%) missing values | Missing |
| Cabin has 350 (78.5%) missing values | Cabin has 351 (78.7%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 306 (68.6%) zeros | SibSp has 312 (70.0%) zeros | Zeros |
| Parch has 346 (77.6%) zeros | Parch has 347 (77.8%) zeros | Zeros |
| Fare has 6 (1.3%) zeros | Fare has 6 (1.3%) zeros | Zeros |
| Alert not present in this dataset | Fare is highly overall correlated with Pclass | High Correlation |
| Alert not present in this dataset | Pclass is highly overall correlated with Fare | High Correlation |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2023-06-21 12:53:43.621477 | 2023-06-21 12:53:48.247868 |
| Analysis finished | 2023-06-21 12:53:48.246091 | 2023-06-21 12:53:52.634410 |
| Duration | 4.62 seconds | 4.39 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 435.98655 | 454.1861 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 2 |
| Maximum | 891 | 890 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 2 |
| 5-th percentile | 46.5 | 47.25 |
| Q1 | 204.5 | 240.25 |
| median | 420.5 | 456.5 |
| Q3 | 660 | 669.75 |
| 95-th percentile | 855 | 854.75 |
| Maximum | 891 | 890 |
| Range | 890 | 888 |
| Interquartile range (IQR) | 455.5 | 429.5 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 260.29785 | 258.15885 |
| Coefficient of variation (CV) | 0.59703184 | 0.56839883 |
| Kurtosis | -1.1940922 | -1.1837863 |
| Mean | 435.98655 | 454.1861 |
| Median Absolute Deviation (MAD) | 225.5 | 215.5 |
| Skewness | 0.087904493 | -0.038801394 |
| Sum | 194450 | 202567 |
| Variance | 67754.971 | 66645.99 |
| Monotonicity | Not monotonic | Not monotonic |
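Several of the PassengerId statistics are derived from others in the same tables. The following sketch recomputes them using the standard definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = max - min, variance = standard deviation squared). The dictionary layout and the `derived` helper are illustrative assumptions, and small differences are expected because the inputs are the rounded values printed in the report.

```python
# Values copied from the PassengerId tables (rounded as reported).
passenger_id = {
    "A": {"mean": 435.98655, "std": 260.29785, "q1": 204.5, "q3": 660.0,
          "min": 1, "max": 891},
    "B": {"mean": 454.1861, "std": 258.15885, "q1": 240.25, "q3": 669.75,
          "min": 2, "max": 890},
}


def derived(s):
    """Recompute statistics that follow from other reported values."""
    return {
        "cv": s["std"] / s["mean"],    # coefficient of variation
        "iqr": s["q3"] - s["q1"],      # interquartile range
        "range": s["max"] - s["min"],
        "variance": s["std"] ** 2,
    }


results = {name: derived(s) for name, s in passenger_id.items()}
for name, r in results.items():
    print(name, r)
```

The recomputed values agree with the reported CV, IQR, range and variance for both datasets to within rounding.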
| Value | Count | Frequency (%) |
| 819 | 1 | 0.2% |
| 546 | 1 | 0.2% |
| 864 | 1 | 0.2% |
| 543 | 1 | 0.2% |
| 318 | 1 | 0.2% |
| 420 | 1 | 0.2% |
| 49 | 1 | 0.2% |
| 512 | 1 | 0.2% |
| 505 | 1 | 0.2% |
| 499 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 400 | 1 | 0.2% |
| 97 | 1 | 0.2% |
| 191 | 1 | 0.2% |
| 121 | 1 | 0.2% |
| 465 | 1 | 0.2% |
| 71 | 1 | 0.2% |
| 746 | 1 | 0.2% |
| 329 | 1 | 0.2% |
| 847 | 1 | 0.2% |
| 759 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 14 | 1 | |
| 15 | 1 | |
| 17 | 1 | |
| 21 | 1 |
| Value | Count | Frequency (%) |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 10 | 1 | |
| 14 | 1 | |
| 15 | 1 | |
| 16 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 1 |
| 2nd row | 1 | 0 |
| 3rd row | 0 | 1 |
| 4th row | 1 | 0 |
| 5th row | 0 | 0 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 271 | 60.8% |
| 1 | 175 | 39.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 276 | 61.9% |
| 1 | 170 | 38.1% |
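The Frequency (%) column for a categorical variable is each count divided by the total number of values. A minimal sketch, using the Survived counts above and a hypothetical helper `frequency_table` (not a ydata-profiling function):

```python
from typing import Dict


def frequency_table(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert value counts to percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {value: round(100.0 * n / total, 1) for value, n in counts.items()}


# Counts copied from the Survived "Common Values" tables.
survived_a = frequency_table({"0": 271, "1": 175})
survived_b = frequency_table({"0": 276, "1": 170})
print("Dataset A:", survived_a)
print("Dataset B:", survived_b)
```

The same helper applies to the other categorical variables in the report, since each has 446 non-missing values per dataset.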
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 271 | |
| 1 | 175 |
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 271 | |
| 1 | 175 |
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 271 | |
| 1 | 175 |
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 271 | |
| 1 | 175 |
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 2 |
| 2nd row | 2 | 3 |
| 3rd row | 3 | 3 |
| 4th row | 3 | 2 |
| 5th row | 3 | 3 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 249 | 55.8% |
| 1 | 108 | 24.2% |
| 2 | 89 | 20.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 257 | 57.6% |
| 1 | 103 | 23.1% |
| 2 | 86 | 19.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 249 | |
| 1 | 108 | |
| 2 | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 3 | 257 | |
| 1 | 103 | |
| 2 | 86 | 19.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 249 | |
| 1 | 108 | |
| 2 | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 3 | 257 | |
| 1 | 103 | |
| 2 | 86 | 19.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 249 | |
| 1 | 108 | |
| 2 | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 3 | 257 | |
| 1 | 103 | |
| 2 | 86 | 19.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 249 | |
| 1 | 108 | |
| 2 | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 3 | 257 | |
| 1 | 103 | |
| 2 | 86 | 19.3% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 65 |
| Median length | 49 | 47 |
| Mean length | 26.860987 | 26.697309 |
| Min length | 14 | 12 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 11980 | 11907 |
| Distinct characters | 60 | 60 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
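The reported mean length of Name is consistent with total characters divided by the number of values (446 per dataset, none missing). The sketch below checks this; `mean_length` is an illustrative helper, and whether the tool computes the figure this way internally is an assumption.

```python
N_ROWS = 446  # observations per dataset; Name has no missing values


def mean_length(total_characters, n_values=N_ROWS):
    """Average string length given the total character count."""
    return total_characters / n_values


mean_a = mean_length(11980)  # Dataset A total characters
mean_b = mean_length(11907)  # Dataset B total characters
print(f"Dataset A: {mean_a:.6f}")
print(f"Dataset B: {mean_b:.6f}")
```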
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Holm, Mr. John Fredrik Alexander | Trout, Mrs. William H (Jessie L) |
| 2nd row | Davies, Master. John Morgan Jr | Karlsson, Mr. Nils August |
| 3rd row | Sage, Master. Thomas Henry | de Mulder, Mr. Theodore |
| 4th row | Madsen, Mr. Fridtjof Arne | McKane, Mr. Peter David |
| 5th row | Dahlberg, Miss. Gerda Ulrika | Alhomaki, Mr. Ilmari Rudolf |
| Value | Count | Frequency (%) |
| mr | 263 | 14.6% |
| miss | 86 | 4.8% |
| mrs | 63 | 3.5% |
| william | 35 | 1.9% |
| john | 24 | 1.3% |
| master | 24 | 1.3% |
| henry | 21 | 1.2% |
| james | 15 | 0.8% |
| charles | 12 | 0.7% |
| joseph | 11 | 0.6% |
| Other values (886) | 1248 |
| Value | Count | Frequency (%) |
| mr | 263 | 14.7% |
| miss | 89 | 5.0% |
| mrs | 63 | 3.5% |
| william | 31 | 1.7% |
| john | 26 | 1.5% |
| master | 18 | 1.0% |
| george | 17 | 0.9% |
| henry | 16 | 0.9% |
| charles | 14 | 0.8% |
| thomas | 13 | 0.7% |
| Other values (894) | 1241 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1356 | 11.3% | |
| r | 998 | 8.3% |
| e | 883 | 7.4% |
| a | 821 | 6.9% |
| i | 642 | 5.4% |
| s | 640 | 5.3% |
| n | 635 | 5.3% |
| M | 571 | 4.8% |
| l | 522 | 4.4% |
| o | 507 | 4.2% |
| Other values (50) | 4405 |
| Value | Count | Frequency (%) |
| 1347 | 11.3% | |
| r | 956 | 8.0% |
| e | 854 | 7.2% |
| a | 816 | 6.9% |
| n | 657 | 5.5% |
| i | 654 | 5.5% |
| s | 644 | 5.4% |
| M | 563 | 4.7% |
| l | 541 | 4.5% |
| o | 510 | 4.3% |
| Other values (50) | 4365 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7706 | |
| Uppercase Letter | 1816 | 15.2% |
| Space Separator | 1356 | 11.3% |
| Other Punctuation | 951 | 7.9% |
| Close Punctuation | 71 | 0.6% |
| Open Punctuation | 71 | 0.6% |
| Dash Punctuation | 9 | 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 7665 | |
| Uppercase Letter | 1807 | 15.2% |
| Space Separator | 1347 | 11.3% |
| Other Punctuation | 944 | 7.9% |
| Close Punctuation | 68 | 0.6% |
| Open Punctuation | 68 | 0.6% |
| Dash Punctuation | 8 | 0.1% |
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| 1356 |
| Value | Count | Frequency (%) |
| 1347 |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 998 | |
| e | 883 | |
| a | 821 | |
| i | 642 | |
| s | 640 | |
| n | 635 | |
| l | 522 | 6.8% |
| o | 507 | 6.6% |
| t | 346 | 4.5% |
| h | 272 | 3.5% |
| Other values (16) | 1440 |
| Value | Count | Frequency (%) |
| r | 956 | |
| e | 854 | |
| a | 816 | |
| n | 657 | |
| i | 654 | |
| s | 644 | |
| l | 541 | 7.1% |
| o | 510 | 6.7% |
| t | 334 | 4.4% |
| h | 254 | 3.3% |
| Other values (16) | 1445 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 571 | |
| J | 119 | 6.6% |
| A | 115 | 6.3% |
| H | 107 | 5.9% |
| C | 96 | 5.3% |
| S | 93 | 5.1% |
| E | 84 | 4.6% |
| W | 76 | 4.2% |
| B | 68 | 3.7% |
| L | 55 | 3.0% |
| Other values (15) | 432 |
| Value | Count | Frequency (%) |
| M | 563 | |
| A | 111 | 6.1% |
| J | 107 | 5.9% |
| H | 97 | 5.4% |
| C | 87 | 4.8% |
| S | 87 | 4.8% |
| E | 83 | 4.6% |
| B | 70 | 3.9% |
| L | 70 | 3.9% |
| W | 66 | 3.7% |
| Other values (15) | 466 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 447 | |
| , | 446 | |
| " | 52 | 5.5% |
| ' | 5 | 0.5% |
| / | 1 | 0.1% |
| Value | Count | Frequency (%) |
| . | 446 | |
| , | 446 | |
| " | 48 | 5.1% |
| ' | 3 | 0.3% |
| / | 1 | 0.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 71 |
| Value | Count | Frequency (%) |
| ) | 68 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 71 |
| Value | Count | Frequency (%) |
| ( | 68 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 9 |
| Value | Count | Frequency (%) |
| - | 8 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9522 | |
| Common | 2458 | 20.5% |
| Value | Count | Frequency (%) |
| Latin | 9472 | |
| Common | 2435 | 20.5% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1356 | ||
| . | 447 | 18.2% |
| , | 446 | 18.1% |
| ) | 71 | 2.9% |
| ( | 71 | 2.9% |
| " | 52 | 2.1% |
| - | 9 | 0.4% |
| ' | 5 | 0.2% |
| / | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1347 | ||
| . | 446 | 18.3% |
| , | 446 | 18.3% |
| ) | 68 | 2.8% |
| ( | 68 | 2.8% |
| " | 48 | 2.0% |
| - | 8 | 0.3% |
| ' | 3 | 0.1% |
| / | 1 | < 0.1% |
Latin
| Value | Count | Frequency (%) |
| r | 998 | 10.5% |
| e | 883 | 9.3% |
| a | 821 | 8.6% |
| i | 642 | 6.7% |
| s | 640 | 6.7% |
| n | 635 | 6.7% |
| M | 571 | 6.0% |
| l | 522 | 5.5% |
| o | 507 | 5.3% |
| t | 346 | 3.6% |
| Other values (41) | 2957 |
| Value | Count | Frequency (%) |
| r | 956 | 10.1% |
| e | 854 | 9.0% |
| a | 816 | 8.6% |
| n | 657 | 6.9% |
| i | 654 | 6.9% |
| s | 644 | 6.8% |
| M | 563 | 5.9% |
| l | 541 | 5.7% |
| o | 510 | 5.4% |
| t | 334 | 3.5% |
| Other values (41) | 2943 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11980 |
| Value | Count | Frequency (%) |
| ASCII | 11907 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1356 | 11.3% | |
| r | 998 | 8.3% |
| e | 883 | 7.4% |
| a | 821 | 6.9% |
| i | 642 | 5.4% |
| s | 640 | 5.3% |
| n | 635 | 5.3% |
| M | 571 | 4.8% |
| l | 522 | 4.4% |
| o | 507 | 4.2% |
| Other values (50) | 4405 |
| Value | Count | Frequency (%) |
| 1347 | 11.3% | |
| r | 956 | 8.0% |
| e | 854 | 7.2% |
| a | 816 | 6.9% |
| n | 657 | 5.5% |
| i | 654 | 5.5% |
| s | 644 | 5.4% |
| M | 563 | 4.7% |
| l | 541 | 4.5% |
| o | 510 | 4.3% |
| Other values (50) | 4365 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.67713 | 4.690583 |
| Min length | 4 | 4 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2086 | 2092 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | female |
| 2nd row | male | male |
| 3rd row | male | male |
| 4th row | male | male |
| 5th row | female | male |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| male | 295 | 66.1% |
| female | 151 | 33.9% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| male | 292 | 65.5% |
| female | 154 | 34.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 597 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 151 | 7.2% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2086 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 2092 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 597 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 151 | 7.2% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2086 |
| Value | Count | Frequency (%) |
| Latin | 2092 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 597 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 151 | 7.2% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2086 |
| Value | Count | Frequency (%) |
| ASCII | 2092 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 597 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 151 | 7.2% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 69 | 80 |
| Distinct (%) | 19.9% | 22.7% |
| Missing | 99 | 93 |
| Missing (%) | 22.2% | 20.9% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.211816 | 29.832861 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| Maximum | 64 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| 5-th percentile | 4 | 3.6 |
| Q1 | 20 | 20 |
| median | 28 | 28.5 |
| Q3 | 38.5 | 39 |
| 95-th percentile | 53.7 | 58 |
| Maximum | 64 | 80 |
| Range | 63.58 | 79.58 |
| Interquartile range (IQR) | 18.5 | 19 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 13.721645 | 14.792112 |
| Coefficient of variation (CV) | 0.46972929 | 0.49583283 |
| Kurtosis | -0.24656624 | 0.20666063 |
| Mean | 29.211816 | 29.832861 |
| Median Absolute Deviation (MAD) | 9 | 9.5 |
| Skewness | 0.12739559 | 0.36664262 |
| Sum | 10136.5 | 10531 |
| Variance | 188.28355 | 218.80658 |
| Monotonicity | Not monotonic | Not monotonic |
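Age is the one numeric variable here with missing values, and the denominators differ between rows. The figures above are consistent with Missing (%) being taken over all 446 rows, while Distinct (%) and Mean are taken over the non-missing values only. The sketch below checks that reading; it is an inference from the reported numbers, not a statement of how ydata-profiling is implemented, and `non_missing_stats` is a hypothetical helper.

```python
N_ROWS = 446  # observations per dataset

# Values copied from the Age tables.
age = {
    "A": {"distinct": 69, "missing": 99, "sum": 10136.5},
    "B": {"distinct": 80, "missing": 93, "sum": 10531.0},
}


def non_missing_stats(col, n_rows=N_ROWS):
    """Recompute percentages and mean with the denominators described above."""
    n_valid = n_rows - col["missing"]
    return {
        "n_valid": n_valid,
        "missing_pct": 100.0 * col["missing"] / n_rows,    # over all rows
        "distinct_pct": 100.0 * col["distinct"] / n_valid,  # over non-missing
        "mean": col["sum"] / n_valid,                       # over non-missing
    }


age_stats = {name: non_missing_stats(col) for name, col in age.items()}
for name, s in age_stats.items():
    print(name, s)
```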
| Value | Count | Frequency (%) |
| 25 | 14 | 3.1% |
| 21 | 13 | 2.9% |
| 19 | 13 | 2.9% |
| 30 | 13 | 2.9% |
| 22 | 12 | 2.7% |
| 27 | 12 | 2.7% |
| 24 | 11 | 2.5% |
| 20 | 11 | 2.5% |
| 29 | 10 | 2.2% |
| 36 | 10 | 2.2% |
| Other values (59) | 228 | |
| (Missing) | 99 |
| Value | Count | Frequency (%) |
| 30 | 16 | 3.6% |
| 19 | 15 | 3.4% |
| 28 | 14 | 3.1% |
| 22 | 13 | 2.9% |
| 18 | 13 | 2.9% |
| 25 | 13 | 2.9% |
| 32 | 13 | 2.9% |
| 24 | 13 | 2.9% |
| 21 | 11 | 2.5% |
| 31 | 11 | 2.5% |
| Other values (70) | 221 | |
| (Missing) | 93 |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 1 | 4 | |
| 2 | 6 | |
| 3 | 2 | 0.4% |
| 4 | 5 | |
| 5 | 2 | 0.4% |
| 7 | 3 | |
| 8 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 2 | 0.4% |
| 0.92 | 1 | 0.2% |
| 1 | 3 | |
| 2 | 6 | |
| 3 | 3 | |
| 4 | 5 | |
| 5 | 1 | 0.2% |
| 7 | 1 | 0.2% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.54484305 | 0.48206278 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 306 | 312 |
| Zeros (%) | 68.6% | 70.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 2 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.2439545 | 1.04651 |
| Coefficient of variation (CV) | 2.2831428 | 2.1708998 |
| Kurtosis | 19.53937 | 19.864923 |
| Mean | 0.54484305 | 0.48206278 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 4.0519772 | 3.8528422 |
| Sum | 243 | 215 |
| Variance | 1.5474228 | 1.0951832 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 306 | |
| 1 | 105 | 23.5% |
| 2 | 13 | 2.9% |
| 8 | 7 | 1.6% |
| 4 | 7 | 1.6% |
| 3 | 6 | 1.3% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 312 | |
| 1 | 101 | 22.6% |
| 2 | 12 | 2.7% |
| 3 | 8 | 1.8% |
| 4 | 8 | 1.8% |
| 8 | 3 | 0.7% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 306 | |
| 1 | 105 | 23.5% |
| 2 | 13 | 2.9% |
| 3 | 6 | 1.3% |
| 4 | 7 | 1.6% |
| 5 | 2 | 0.4% |
| 8 | 7 | 1.6% |
| Value | Count | Frequency (%) |
| 0 | 312 | |
| 1 | 101 | 22.6% |
| 2 | 12 | 2.7% |
| 3 | 8 | 1.8% |
| 4 | 8 | 1.8% |
| 5 | 2 | 0.4% |
| 8 | 3 | 0.7% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 6 |
| Distinct (%) | 1.3% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.367713 | 0.36547085 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 5 |
| Zeros | 346 | 347 |
| Zeros (%) | 77.6% | 77.8% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 5 |
| Range | 5 | 5 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.81515845 | 0.81754394 |
| Coefficient of variation (CV) | 2.2168333 | 2.2369607 |
| Kurtosis | 10.240577 | 10.291075 |
| Mean | 0.367713 | 0.36547085 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.8823125 | 2.9046286 |
| Sum | 164 | 163 |
| Variance | 0.6644833 | 0.66837809 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 346 | |
| 1 | 54 | 12.1% |
| 2 | 38 | 8.5% |
| 5 | 4 | 0.9% |
| 4 | 2 | 0.4% |
| 3 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 347 | |
| 1 | 54 | 12.1% |
| 2 | 36 | 8.1% |
| 5 | 4 | 0.9% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 346 | |
| 1 | 54 | 12.1% |
| 2 | 38 | 8.5% |
| 3 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| 5 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 347 | |
| 1 | 54 | 12.1% |
| 2 | 36 | 8.1% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 5 | 4 | 0.9% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 389 | 388 |
| Distinct (%) | 87.2% | 87.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.6995516 | 6.7623318 |
| Min length | 3 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2988 | 3016 |
| Distinct characters | 35 | 32 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 344 | 344 |
| Unique (%) | 77.1% | 77.1% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | C 7075 | 240929 |
| 2nd row | C.A. 33112 | 350060 |
| 3rd row | CA. 2343 | 345774 |
| 4th row | C 17369 | 28403 |
| 5th row | 7552 | SOTON/O2 3101287 |
| Value | Count | Frequency (%) |
| pc | 29 | 5.2% |
| c.a | 12 | 2.1% |
| ca | 9 | 1.6% |
| a/5 | 7 | 1.2% |
| 2343 | 7 | 1.2% |
| 2 | 7 | 1.2% |
| ston/o | 7 | 1.2% |
| 382652 | 5 | 0.9% |
| soton/o.q | 5 | 0.9% |
| w./c | 4 | 0.7% |
| Other values (410) | 470 |
| Value | Count | Frequency (%) |
| pc | 26 | 4.7% |
| c.a | 9 | 1.6% |
| a/5 | 7 | 1.3% |
| 2 | 7 | 1.3% |
| ston/o | 7 | 1.3% |
| ca | 6 | 1.1% |
| 347082 | 6 | 1.1% |
| soton/o.q | 5 | 0.9% |
| soton/oq | 5 | 0.9% |
| ston/o2 | 5 | 0.9% |
| Other values (407) | 476 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 390 | |
| 1 | 337 | |
| 2 | 293 | |
| 7 | 257 | |
| 6 | 238 | |
| 4 | 210 | 7.0% |
| 0 | 195 | 6.5% |
| 5 | 194 | 6.5% |
| 9 | 159 | 5.3% |
| 8 | 129 | 4.3% |
| Other values (25) | 586 |
| Value | Count | Frequency (%) |
| 3 | 390 | |
| 1 | 336 | |
| 2 | 296 | |
| 7 | 233 | 7.7% |
| 4 | 230 | 7.6% |
| 6 | 222 | 7.4% |
| 0 | 208 | 6.9% |
| 5 | 193 | 6.4% |
| 9 | 165 | 5.5% |
| 8 | 145 | 4.8% |
| Other values (22) | 598 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2402 | |
| Uppercase Letter | 316 | 10.6% |
| Other Punctuation | 142 | 4.8% |
| Space Separator | 116 | 3.9% |
| Lowercase Letter | 12 | 0.4% |
| Value | Count | Frequency (%) |
| Decimal Number | 2418 | |
| Uppercase Letter | 323 | 10.7% |
| Other Punctuation | 153 | 5.1% |
| Space Separator | 113 | 3.7% |
| Lowercase Letter | 9 | 0.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 390 | |
| 1 | 337 | |
| 2 | 293 | |
| 7 | 257 | |
| 6 | 238 | |
| 4 | 210 | |
| 0 | 195 | |
| 5 | 194 | |
| 9 | 159 | |
| 8 | 129 | 5.4% |
| Value | Count | Frequency (%) |
| 3 | 390 | |
| 1 | 336 | |
| 2 | 296 | |
| 7 | 233 | |
| 4 | 230 | |
| 6 | 222 | |
| 0 | 208 | |
| 5 | 193 | |
| 9 | 165 | |
| 8 | 145 | 6.0% |
Space Separator
| Value | Count | Frequency (%) |
| 116 |
| Value | Count | Frequency (%) |
| 113 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 94 | |
| / | 48 |
| Value | Count | Frequency (%) |
| . | 102 | |
| / | 51 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 75 | |
| O | 48 | |
| P | 42 | |
| A | 40 | |
| S | 37 | |
| N | 19 | 6.0% |
| T | 17 | 5.4% |
| W | 9 | 2.8% |
| Q | 8 | 2.5% |
| I | 5 | 1.6% |
| Other values (6) | 16 | 5.1% |
| Value | Count | Frequency (%) |
| O | 65 | |
| C | 63 | |
| P | 42 | |
| S | 40 | |
| A | 33 | |
| N | 25 | 7.7% |
| T | 23 | 7.1% |
| Q | 10 | 3.1% |
| W | 7 | 2.2% |
| F | 5 | 1.5% |
| Other values (5) | 10 | 3.1% |
Lowercase Letter
| Value | Count | Frequency (%) |
| s | 3 | |
| a | 3 | |
| r | 2 | |
| i | 2 | |
| l | 1 | 8.3% |
| e | 1 | 8.3% |
| Value | Count | Frequency (%) |
| a | 3 | |
| r | 2 | |
| i | 2 | |
| s | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2660 | |
| Latin | 328 | 11.0% |
| Value | Count | Frequency (%) |
| Common | 2684 | |
| Latin | 332 | 11.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 390 | |
| 1 | 337 | |
| 2 | 293 | |
| 7 | 257 | |
| 6 | 238 | |
| 4 | 210 | |
| 0 | 195 | |
| 5 | 194 | |
| 9 | 159 | |
| 8 | 129 | 4.8% |
| Other values (3) | 258 |
| Value | Count | Frequency (%) |
| 3 | 390 | |
| 1 | 336 | |
| 2 | 296 | |
| 7 | 233 | |
| 4 | 230 | |
| 6 | 222 | |
| 0 | 208 | |
| 5 | 193 | |
| 9 | 165 | |
| 8 | 145 | 5.4% |
| Other values (3) | 266 |
Latin
| Value | Count | Frequency (%) |
| C | 75 | |
| O | 48 | |
| P | 42 | |
| A | 40 | |
| S | 37 | |
| N | 19 | 5.8% |
| T | 17 | 5.2% |
| W | 9 | 2.7% |
| Q | 8 | 2.4% |
| I | 5 | 1.5% |
| Other values (12) | 28 | 8.5% |
| Value | Count | Frequency (%) |
| O | 65 | |
| C | 63 | |
| P | 42 | |
| S | 40 | |
| A | 33 | |
| N | 25 | 7.5% |
| T | 23 | 6.9% |
| Q | 10 | 3.0% |
| W | 7 | 2.1% |
| F | 5 | 1.5% |
| Other values (9) | 19 | 5.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2988 |
| Value | Count | Frequency (%) |
| ASCII | 3016 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 390 | |
| 1 | 337 | |
| 2 | 293 | |
| 7 | 257 | |
| 6 | 238 | |
| 4 | 210 | 7.0% |
| 0 | 195 | 6.5% |
| 5 | 194 | 6.5% |
| 9 | 159 | 5.3% |
| 8 | 129 | 4.3% |
| Other values (25) | 586 |
| Value | Count | Frequency (%) |
| 3 | 390 | |
| 1 | 336 | |
| 2 | 296 | |
| 7 | 233 | 7.7% |
| 4 | 230 | 7.6% |
| 6 | 222 | 7.4% |
| 0 | 208 | 6.9% |
| 5 | 193 | 6.4% |
| 9 | 165 | 5.5% |
| 8 | 145 | 4.8% |
| Other values (22) | 598 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 188 | 181 |
| Distinct (%) | 42.2% | 40.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 31.985706 | 28.221 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 263 |
| Zeros | 6 | 6 |
| Zeros (%) | 1.3% | 1.3% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.162525 | 7.225 |
| Q1 | 7.9031 | 7.8958 |
| median | 13.89585 | 13.68125 |
| Q3 | 30.0708 | 30.0708 |
| 95-th percentile | 120 | 90 |
| Maximum | 512.3292 | 263 |
| Range | 512.3292 | 263 |
| Interquartile range (IQR) | 22.1677 | 22.175 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 50.411634 | 34.576378 |
| Coefficient of variation (CV) | 1.5760676 | 1.2252003 |
| Kurtosis | 39.132238 | 13.121447 |
| Mean | 31.985706 | 28.221 |
| Median Absolute Deviation (MAD) | 6.64585 | 6.13125 |
| Skewness | 5.1942457 | 3.0872007 |
| Sum | 14265.625 | 12586.566 |
| Variance | 2541.3328 | 1195.5259 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 13 | 21 | 4.7% |
| 7.8958 | 21 | 4.7% |
| 8.05 | 20 | 4.5% |
| 26 | 17 | 3.8% |
| 7.75 | 17 | 3.8% |
| 10.5 | 11 | 2.5% |
| 7.925 | 9 | 2.0% |
| 26.55 | 7 | 1.6% |
| 7.2292 | 7 | 1.6% |
| 8.6625 | 7 | 1.6% |
| Other values (178) | 309 |
| Value | Count | Frequency (%) |
| 8.05 | 28 | 6.3% |
| 13 | 24 | 5.4% |
| 7.75 | 23 | 5.2% |
| 7.8958 | 22 | 4.9% |
| 26 | 15 | 3.4% |
| 7.925 | 10 | 2.2% |
| 7.775 | 8 | 1.8% |
| 7.2292 | 7 | 1.6% |
| 10.5 | 7 | 1.6% |
| 7.8542 | 7 | 1.6% |
| Other values (171) | 295 |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6 | 1.3% |
| 4.0125 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6 | 1.3% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 2 | 0.4% |
| 7.05 | 4 | 0.9% |
| 7.125 | 3 | 0.7% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 85 | 83 |
| Distinct (%) | 88.5% | 87.4% |
| Missing | 350 | 351 |
| Missing (%) | 78.5% | 78.7% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 11 | 11 |
| Median length | 3 | 3 |
| Mean length | 3.4791667 | 3.3789474 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 334 | 321 |
| Distinct characters | 18 | 18 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 75 | 71 |
| Unique (%) | 78.1% | 74.7% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | D49 | A10 |
| 2nd row | E8 | E33 |
| 3rd row | B77 | D17 |
| 4th row | B5 | E49 |
| 5th row | C103 | C93 |
Words
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| b96 | 3 | 2.8% |
| b98 | 3 | 2.8% |
| e101 | 2 | 1.8% |
| e8 | 2 | 1.8% |
| d | 2 | 1.8% |
| c124 | 2 | 1.8% |
| e44 | 2 | 1.8% |
| g6 | 2 | 1.8% |
| c26 | 2 | 1.8% |
| c22 | 2 | 1.8% |
| Other values (85) | 87 | 79.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| c22 | 2 | 1.9% |
| b58 | 2 | 1.9% |
| c26 | 2 | 1.9% |
| c93 | 2 | 1.9% |
| b20 | 2 | 1.9% |
| e33 | 2 | 1.9% |
| e101 | 2 | 1.9% |
| b60 | 2 | 1.9% |
| e44 | 2 | 1.9% |
| c126 | 2 | 1.9% |
| Other values (79) | 84 | 80.8% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 33 | 9.9% |
| B | 32 | 9.6% |
| C | 31 | 9.3% |
| 3 | 29 | 8.7% |
| 2 | 25 | 7.5% |
| 4 | 24 | 7.2% |
| D | 21 | 6.3% |
| 6 | 20 | 6.0% |
| 9 | 19 | 5.7% |
| 0 | 17 | 5.1% |
| Other values (8) | 83 | 24.9% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| C | 32 | 10.0% |
| 2 | 32 | 10.0% |
| 1 | 28 | 8.7% |
| 3 | 24 | 7.5% |
| B | 23 | 7.2% |
| 6 | 22 | 6.9% |
| 4 | 22 | 6.9% |
| 8 | 20 | 6.2% |
| E | 19 | 5.9% |
| 5 | 18 | 5.6% |
| Other values (8) | 81 | 25.2% |
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 212 | 63.5% |
| Uppercase Letter | 109 | 32.6% |
| Space Separator | 13 | 3.9% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 208 | 64.8% |
| Uppercase Letter | 104 | 32.4% |
| Space Separator | 9 | 2.8% |
Most frequent character per category
Decimal Number
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 33 | 15.6% |
| 3 | 29 | 13.7% |
| 2 | 25 | 11.8% |
| 4 | 24 | 11.3% |
| 6 | 20 | 9.4% |
| 9 | 19 | 9.0% |
| 0 | 17 | 8.0% |
| 8 | 16 | 7.5% |
| 5 | 15 | 7.1% |
| 7 | 14 | 6.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 32 | 15.4% |
| 1 | 28 | 13.5% |
| 3 | 24 | 11.5% |
| 6 | 22 | 10.6% |
| 4 | 22 | 10.6% |
| 8 | 20 | 9.6% |
| 5 | 18 | 8.7% |
| 0 | 17 | 8.2% |
| 9 | 13 | 6.2% |
| 7 | 12 | 5.8% |
Uppercase Letter
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| B | 32 | 29.4% |
| C | 31 | 28.4% |
| D | 21 | 19.3% |
| E | 12 | 11.0% |
| A | 6 | 5.5% |
| F | 4 | 3.7% |
| G | 3 | 2.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| C | 32 | 30.8% |
| B | 23 | 22.1% |
| E | 19 | 18.3% |
| D | 17 | 16.3% |
| A | 8 | 7.7% |
| F | 4 | 3.8% |
| G | 1 | 1.0% |
Space Separator
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 13 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 9 | 100.0% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 225 | 67.4% |
| Latin | 109 | 32.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 217 | 67.6% |
| Latin | 104 | 32.4% |
Most frequent character per script
Common
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 33 | 14.7% |
| 3 | 29 | 12.9% |
| 2 | 25 | 11.1% |
| 4 | 24 | 10.7% |
| 6 | 20 | 8.9% |
| 9 | 19 | 8.4% |
| 0 | 17 | 7.6% |
| 8 | 16 | 7.1% |
| 5 | 15 | 6.7% |
| 7 | 14 | 6.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 32 | 14.7% |
| 1 | 28 | 12.9% |
| 3 | 24 | 11.1% |
| 6 | 22 | 10.1% |
| 4 | 22 | 10.1% |
| 8 | 20 | 9.2% |
| 5 | 18 | 8.3% |
| 0 | 17 | 7.8% |
| 9 | 13 | 6.0% |
| 7 | 12 | 5.5% |
Latin
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| B | 32 | 29.4% |
| C | 31 | 28.4% |
| D | 21 | 19.3% |
| E | 12 | 11.0% |
| A | 6 | 5.5% |
| F | 4 | 3.7% |
| G | 3 | 2.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| C | 32 | 30.8% |
| B | 23 | 22.1% |
| E | 19 | 18.3% |
| D | 17 | 16.3% |
| A | 8 | 7.7% |
| F | 4 | 3.8% |
| G | 1 | 1.0% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 334 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 321 | 100.0% |
Most frequent character per block
ASCII
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 33 | 9.9% |
| B | 32 | 9.6% |
| C | 31 | 9.3% |
| 3 | 29 | 8.7% |
| 2 | 25 | 7.5% |
| 4 | 24 | 7.2% |
| D | 21 | 6.3% |
| 6 | 20 | 6.0% |
| 9 | 19 | 5.7% |
| 0 | 17 | 5.1% |
| Other values (8) | 83 | 24.9% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| C | 32 | 10.0% |
| 2 | 32 | 10.0% |
| 1 | 28 | 8.7% |
| 3 | 24 | 7.5% |
| B | 23 | 7.2% |
| 6 | 22 | 6.9% |
| 4 | 22 | 6.9% |
| 8 | 20 | 6.2% |
| E | 19 | 5.9% |
| 5 | 18 | 5.6% |
| Other values (8) | 81 | 25.2% |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | S |
| 3rd row | S | S |
| 4th row | S | S |
| 5th row | S | S |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 87 | 19.5% |
| Q | 42 | 9.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 324 | 72.6% |
| C | 78 | 17.5% |
| Q | 44 | 9.9% |
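The Frequency (%) column in these tables appears to be each count divided by the column's total, shown to one decimal place. The sketch below reproduces that arithmetic for the Embarked counts. The function name `frequency_percent` is hypothetical, and the rounding rule is an assumption chosen to match the percentages printed in the report.

```python
def frequency_percent(count, total):
    """Share of `total` represented by `count`, as a percentage with one decimal."""
    return round(100 * count / total, 1)


# Counts copied from the Embarked "Common Values" tables.
EMBARKED_COUNTS = {
    "Dataset A": {"S": 317, "C": 87, "Q": 42},
    "Dataset B": {"S": 324, "C": 78, "Q": 44},
}

for dataset, counts in EMBARKED_COUNTS.items():
    total = sum(counts.values())  # 446 in both datasets (no missing values)
    for value, count in counts.items():
        print(f"{dataset}: {value} = {frequency_percent(count, total)}%")
```

Under that assumption the S rows work out to 71.1% for Dataset A and 72.6% for Dataset B, and the C and Q rows match the percentages printed in the tables.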
[Plots omitted: Length; Common Values (Plot), Dataset A and Dataset B]
Words
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| s | 317 | 71.1% |
| c | 87 | 19.5% |
| q | 42 | 9.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| s | 324 | 72.6% |
| c | 78 | 17.5% |
| q | 44 | 9.9% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 87 | 19.5% |
| Q | 42 | 9.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 324 | 72.6% |
| C | 78 | 17.5% |
| Q | 44 | 9.9% |
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 446 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 446 | 100.0% |
Most frequent character per category
Uppercase Letter
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 87 | 19.5% |
| Q | 42 | 9.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 324 | 72.6% |
| C | 78 | 17.5% |
| Q | 44 | 9.9% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 446 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 446 | 100.0% |
Most frequent character per script
Latin
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 87 | 19.5% |
| Q | 42 | 9.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 324 | 72.6% |
| C | 78 | 17.5% |
| Q | 44 | 9.9% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 446 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 446 | 100.0% |
Most frequent character per block
ASCII
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 87 | 19.5% |
| Q | 42 | 9.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 324 | 72.6% |
| C | 78 | 17.5% |
| Q | 44 | 9.9% |
Dataset A
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.016 | -0.040 | 0.005 | 0.012 | 0.000 | 0.000 | 0.128 | 0.000 |
| Age | 0.016 | 1.000 | -0.142 | -0.219 | 0.218 | 0.084 | 0.328 | 0.000 | 0.000 |
| SibSp | -0.040 | -0.142 | 1.000 | 0.377 | 0.461 | 0.207 | 0.154 | 0.172 | 0.152 |
| Parch | 0.005 | -0.219 | 0.377 | 1.000 | 0.401 | 0.138 | 0.059 | 0.237 | 0.058 |
| Fare | 0.012 | 0.218 | 0.461 | 0.401 | 1.000 | 0.267 | 0.480 | 0.211 | 0.171 |
| Survived | 0.000 | 0.084 | 0.207 | 0.138 | 0.267 | 1.000 | 0.319 | 0.525 | 0.192 |
| Pclass | 0.000 | 0.328 | 0.154 | 0.059 | 0.480 | 0.319 | 1.000 | 0.117 | 0.250 |
| Sex | 0.128 | 0.000 | 0.172 | 0.237 | 0.211 | 0.525 | 0.117 | 1.000 | 0.115 |
| Embarked | 0.000 | 0.000 | 0.152 | 0.058 | 0.171 | 0.192 | 0.250 | 0.115 | 1.000 |
Dataset B
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.013 | -0.038 | -0.014 | -0.008 | 0.105 | 0.000 | 0.000 | 0.000 |
| Age | 0.013 | 1.000 | -0.137 | -0.241 | 0.144 | 0.163 | 0.245 | 0.124 | 0.114 |
| SibSp | -0.038 | -0.137 | 1.000 | 0.416 | 0.489 | 0.184 | 0.159 | 0.218 | 0.076 |
| Parch | -0.014 | -0.241 | 0.416 | 1.000 | 0.404 | 0.179 | 0.052 | 0.228 | 0.081 |
| Fare | -0.008 | 0.144 | 0.489 | 0.404 | 1.000 | 0.235 | 0.548 | 0.124 | 0.240 |
| Survived | 0.105 | 0.163 | 0.184 | 0.179 | 0.235 | 1.000 | 0.338 | 0.501 | 0.120 |
| Pclass | 0.000 | 0.245 | 0.159 | 0.052 | 0.548 | 0.338 | 1.000 | 0.114 | 0.255 |
| Sex | 0.000 | 0.124 | 0.218 | 0.228 | 0.124 | 0.501 | 0.114 | 1.000 | 0.096 |
| Embarked | 0.000 | 0.114 | 0.076 | 0.081 | 0.240 | 0.120 | 0.255 | 0.096 | 1.000 |
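The two matrices above can be compared against the High Correlation alerts listed at the top of the report. The sketch below flags every variable pair whose absolute correlation exceeds a cut-off. The cut-off of 0.5 is an assumption, chosen because it is consistent with the alerts shown, and the helper name `high_correlation_pairs` is hypothetical.

```python
COLUMNS = ["PassengerId", "Age", "SibSp", "Parch", "Fare",
           "Survived", "Pclass", "Sex", "Embarked"]

# Correlation matrices copied from the tables above (rows follow COLUMNS).
CORR_A = [
    [1.000, 0.016, -0.040, 0.005, 0.012, 0.000, 0.000, 0.128, 0.000],
    [0.016, 1.000, -0.142, -0.219, 0.218, 0.084, 0.328, 0.000, 0.000],
    [-0.040, -0.142, 1.000, 0.377, 0.461, 0.207, 0.154, 0.172, 0.152],
    [0.005, -0.219, 0.377, 1.000, 0.401, 0.138, 0.059, 0.237, 0.058],
    [0.012, 0.218, 0.461, 0.401, 1.000, 0.267, 0.480, 0.211, 0.171],
    [0.000, 0.084, 0.207, 0.138, 0.267, 1.000, 0.319, 0.525, 0.192],
    [0.000, 0.328, 0.154, 0.059, 0.480, 0.319, 1.000, 0.117, 0.250],
    [0.128, 0.000, 0.172, 0.237, 0.211, 0.525, 0.117, 1.000, 0.115],
    [0.000, 0.000, 0.152, 0.058, 0.171, 0.192, 0.250, 0.115, 1.000],
]
CORR_B = [
    [1.000, 0.013, -0.038, -0.014, -0.008, 0.105, 0.000, 0.000, 0.000],
    [0.013, 1.000, -0.137, -0.241, 0.144, 0.163, 0.245, 0.124, 0.114],
    [-0.038, -0.137, 1.000, 0.416, 0.489, 0.184, 0.159, 0.218, 0.076],
    [-0.014, -0.241, 0.416, 1.000, 0.404, 0.179, 0.052, 0.228, 0.081],
    [-0.008, 0.144, 0.489, 0.404, 1.000, 0.235, 0.548, 0.124, 0.240],
    [0.105, 0.163, 0.184, 0.179, 0.235, 1.000, 0.338, 0.501, 0.120],
    [0.000, 0.245, 0.159, 0.052, 0.548, 0.338, 1.000, 0.114, 0.255],
    [0.000, 0.124, 0.218, 0.228, 0.124, 0.501, 0.114, 1.000, 0.096],
    [0.000, 0.114, 0.076, 0.081, 0.240, 0.120, 0.255, 0.096, 1.000],
]

# Assumed cut-off; not stated anywhere in the report itself.
THRESHOLD = 0.5


def high_correlation_pairs(matrix, columns, threshold):
    """List (column, column, value) for off-diagonal pairs with |value| > threshold."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):  # upper triangle only
            if abs(matrix[i][j]) > threshold:
                pairs.append((columns[i], columns[j], matrix[i][j]))
    return pairs


print("Dataset A:", high_correlation_pairs(CORR_A, COLUMNS, THRESHOLD))
print("Dataset B:", high_correlation_pairs(CORR_B, COLUMNS, THRESHOLD))
```

With this cut-off, Dataset A yields only Survived/Sex (0.525), while Dataset B yields Fare/Pclass (0.548) and Survived/Sex (0.501). That matches the Fare/Pclass alert being present for Dataset B only.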
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 818 | 819 | 0 | 3 | Holm, Mr. John Fredrik Alexander | male | 43.0 | 0 | 0 | C 7075 | 6.4500 | NaN | S |
| 549 | 550 | 1 | 2 | Davies, Master. John Morgan Jr | male | 8.0 | 1 | 1 | C.A. 33112 | 36.7500 | NaN | S |
| 159 | 160 | 0 | 3 | Sage, Master. Thomas Henry | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 127 | 128 | 1 | 3 | Madsen, Mr. Fridtjof Arne | male | 24.0 | 0 | 0 | C 17369 | 7.1417 | NaN | S |
| 882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S |
| 722 | 723 | 0 | 2 | Gillespie, Mr. William Henry | male | 34.0 | 0 | 0 | 12233 | 13.0000 | NaN | S |
| 797 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.0 | 0 | 0 | 349244 | 8.6833 | NaN | S |
| 14 | 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14.0 | 0 | 0 | 350406 | 7.8542 | NaN | S |
| 603 | 604 | 0 | 3 | Torber, Mr. Ernst William | male | 44.0 | 0 | 0 | 364511 | 8.0500 | NaN | S |
| 517 | 518 | 0 | 3 | Ryan, Mr. Patrick | male | NaN | 0 | 0 | 371110 | 24.1500 | NaN | Q |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 399 | 400 | 1 | 2 | Trout, Mrs. William H (Jessie L) | female | 28.0 | 0 | 0 | 240929 | 12.6500 | NaN | S |
| 478 | 479 | 0 | 3 | Karlsson, Mr. Nils August | male | 22.0 | 0 | 0 | 350060 | 7.5208 | NaN | S |
| 286 | 287 | 1 | 3 | de Mulder, Mr. Theodore | male | 30.0 | 0 | 0 | 345774 | 9.5000 | NaN | S |
| 397 | 398 | 0 | 2 | McKane, Mr. Peter David | male | 46.0 | 0 | 0 | 28403 | 26.0000 | NaN | S |
| 840 | 841 | 0 | 3 | Alhomaki, Mr. Ilmari Rudolf | male | 20.0 | 0 | 0 | SOTON/O2 3101287 | 7.9250 | NaN | S |
| 406 | 407 | 0 | 3 | Widegren, Mr. Carl/Charles Peter | male | 51.0 | 0 | 0 | 347064 | 7.7500 | NaN | S |
| 583 | 584 | 0 | 1 | Ross, Mr. John Hugo | male | 36.0 | 0 | 0 | 13049 | 40.1250 | A10 | C |
| 356 | 357 | 1 | 1 | Bowerman, Miss. Elsie Edith | female | 22.0 | 0 | 1 | 113505 | 55.0000 | E33 | S |
| 862 | 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Barron) | female | 48.0 | 0 | 0 | 17466 | 25.9292 | D17 | S |
| 324 | 325 | 0 | 3 | Sage, Mr. George John Jr | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 382 | 383 | 0 | 3 | Tikkanen, Mr. Juho | male | 32.0 | 0 | 0 | STON/O 2. 3101293 | 7.9250 | NaN | S |
| 31 | 32 | 1 | 1 | Spencer, Mrs. William Augustus (Marie Eugenie) | female | NaN | 1 | 0 | PC 17569 | 146.5208 | B78 | C |
| 453 | 454 | 1 | 1 | Goldenberg, Mr. Samuel L | male | 49.0 | 1 | 0 | 17453 | 89.1042 | C92 | C |
| 467 | 468 | 0 | 1 | Smart, Mr. John Montgomery | male | 56.0 | 0 | 0 | 113792 | 26.5500 | NaN | S |
| 251 | 252 | 0 | 3 | Strom, Mrs. Wilhelm (Elna Matilda Persson) | female | 29.0 | 1 | 1 | 347054 | 10.4625 | G6 | S |
| 195 | 196 | 1 | 1 | Lurette, Miss. Elise | female | 58.0 | 0 | 0 | PC 17569 | 146.5208 | B80 | C |
| 150 | 151 | 0 | 2 | Bateman, Rev. Robert James | male | 51.0 | 0 | 0 | S.O.P. 1166 | 12.5250 | NaN | S |
| 276 | 277 | 0 | 3 | Lindblom, Miss. Augusta Charlotta | female | 45.0 | 0 | 0 | 347073 | 7.7500 | NaN | S |
| 375 | 376 | 1 | 1 | Meyer, Mrs. Edgar Joseph (Leila Saks) | female | NaN | 1 | 0 | PC 17604 | 82.1708 | NaN | C |
| 479 | 480 | 1 | 3 | Hirvonen, Miss. Hildur E | female | 2.0 | 0 | 1 | 3101298 | 12.2875 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 857 | 858 | 1 | 1 | Daly, Mr. Peter Denis | male | 51.0 | 0 | 0 | 113055 | 26.5500 | E17 | S |
| 30 | 31 | 0 | 1 | Uruchurtu, Don. Manuel E | male | 40.0 | 0 | 0 | PC 17601 | 27.7208 | NaN | C |
| 40 | 41 | 0 | 3 | Ahlin, Mrs. Johan (Johanna Persdotter Larsson) | female | 40.0 | 1 | 0 | 7546 | 9.4750 | NaN | S |
| 370 | 371 | 1 | 1 | Harder, Mr. George Achilles | male | 25.0 | 1 | 0 | 11765 | 55.4417 | E50 | C |
| 660 | 661 | 1 | 1 | Frauenthal, Dr. Henry William | male | 50.0 | 2 | 0 | PC 17611 | 133.6500 | NaN | S |
| 681 | 682 | 1 | 1 | Hassab, Mr. Hammad | male | 27.0 | 0 | 0 | PC 17572 | 76.7292 | D49 | C |
| 237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.0 | 0 | 2 | C.A. 31921 | 26.2500 | NaN | S |
| 418 | 419 | 0 | 2 | Matthews, Mr. William John | male | 30.0 | 0 | 0 | 28228 | 13.0000 | NaN | S |
| 390 | 391 | 1 | 1 | Carter, Mr. William Ernest | male | 36.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 639 | 640 | 0 | 3 | Thorneycroft, Mr. Percival | male | NaN | 1 | 0 | 376564 | 16.1000 | NaN | S |
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.